link prediction model
Below, we address the main questions and concerns that were raised in the reviews
We thank the reviewers for their thoughtful comments and suggestions. We will incorporate them in our revised version. Below, we address the main questions and concerns that were raised in the reviews. This is a great suggestion. Table 1 compares the training time for all of the models on the particle physics experiment.
Skill Discovery for Software Scripting Automation via Offline Simulations with LLMs
Xu, Paiheng, Wu, Gang, Chen, Xiang, Yu, Tong, Xiao, Chang, Dernoncourt, Franck, Zhou, Tianyi, Ai, Wei, Swaminathan, Viswanathan
Scripting interfaces enable users to automate tasks and customize software workflows, but creating scripts traditionally requires programming expertise and familiarity with specific APIs, posing barriers for many users. While Large Language Models (LLMs) can generate code from natural language queries, runtime code generation is severely limited due to unverified code, security risks, longer response times, and higher computational costs. To bridge the gap, we propose an offline simulation framework to curate a software-specific skillset, a collection of verified scripts, by exploiting LLMs and publicly available scripting guides. Our framework comprises two components: (1) task creation, using top-down functionality guidance and bottom-up API synergy exploration to generate helpful tasks; and (2) skill generation with trials, refining and validating scripts based on execution feedback. To efficiently navigate the extensive API landscape, we introduce a Graph Neural Network (GNN)-based link prediction model to capture API synergy, enabling the generation of skills involving underutilized APIs and expanding the skillset's diversity. Experiments with Adobe Illustrator demonstrate that our framework significantly improves automation success rates, reduces response time, and saves runtime token costs compared to traditional runtime code generation. This is the first attempt to use software scripting interfaces as a testbed for LLM-based systems, highlighting the advantages of leveraging execution feedback in a controlled environment and offering valuable insights into aligning AI capabilities with user needs in specialized software domains.
Transfer Learning for Temporal Link Prediction
Chatterjee, Ayan, Ikica, Barbara, Ravandi, Babak, Palowitch, John
Link prediction on graphs has applications spanning from recommender systems to drug discovery. Temporal link prediction (TLP) refers to predicting future links in a temporally evolving graph and adds complexity related to the dynamic nature of graphs. State-of-the-art TLP models incorporate memory modules alongside graph neural networks to learn both the temporal mechanisms of incoming nodes and the evolving graph topology. However, memory modules only store information about nodes seen at train time, and hence such models cannot be directly transferred to entirely new graphs at test time and deployment. In this work, we study a new transfer learning task for temporal link prediction, and develop transfer-effective methods for memory-laden models. Specifically, motivated by work showing the informativeness of structural signals for the TLP task, we augment existing TLP model architectures with a structural mapping module, which learns a mapping from graph structural (topological) features to memory embeddings. Our work paves the way for a memory-free foundation model for TLP.
Can GNNs Learn Link Heuristics? A Concise Review and Evaluation of Link Prediction Methods
Liang, Shuming, Ding, Yu, Li, Zhidong, Liang, Bin, Zhang, Siqi, Wang, Yang, Chen, Fang
This paper explores the ability of Graph Neural Networks (GNNs) to learn various forms of information for link prediction, alongside a brief review of existing link prediction methods. Our analysis reveals that GNNs cannot effectively learn structural information related to the number of common neighbors between two nodes, primarily due to the set-based pooling used in the neighborhood aggregation scheme. Also, our extensive experiments indicate that trainable node embeddings can improve the performance of GNN-based link prediction models. Importantly, we observe that the denser the graph, the greater the improvement. We attribute this to the characteristics of node embeddings, where the link state of each link sample could be encoded into the embeddings of nodes that are involved in the neighborhood aggregation of the two nodes in that link sample. In denser graphs, every node has more opportunities to participate in the neighborhood aggregation of other nodes and to encode the states of more link samples into its embedding, thus learning better node embeddings for link prediction. Lastly, we demonstrate that the insights gained from our research carry important implications for identifying the limitations of existing link prediction methods, which could guide the future development of more robust algorithms.
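For reference, the common-neighbor count mentioned in the abstract above is a simple structural heuristic that can be computed directly from adjacency sets. The following is a minimal sketch on a toy graph; the function names `build_adjacency` and `common_neighbors` and the edge list are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def common_neighbors(adj, u, v):
    """Count nodes adjacent to both u and v (hypothetical helper)."""
    # .get avoids creating empty entries for unseen nodes.
    return len(adj.get(u, set()) & adj.get(v, set()))

# Toy graph (assumed data): nodes 0 and 3 are not linked
# but share the neighbors 1 and 2.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
adj = build_adjacency(edges)
score_0_3 = common_neighbors(adj, 0, 3)
```

A higher count is usually read as a higher likelihood that the candidate pair is linked.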
Reviews: Poincaré Embeddings for Learning Hierarchical Representations
Summary The paper proposes a link prediction model that embeds symbols in a hyperbolic space using Poincaré embeddings. In this space, tree structures can more easily be represented as the distance to points increases exponentially w.r.t. The paper is motivated and written well. Furthermore, the presented method is intriguing and I believe it will have a notable impact on link prediction research. My concerns are regarding the comparison to state-of-the-art link prediction and how the method performs if the assumption about a hierarchy in the data is dropped.
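To illustrate how distances behave in the Poincaré ball, the sketch below implements the standard Poincaré-ball distance, d(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))). The function name `poincare_distance` and the sample points are assumptions made for illustration; this is not the reviewed paper's code.

```python
import math

def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # The denominator shrinks toward zero near the boundary,
    # so distances grow quickly as points move outward.
    ratio = 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(1.0 + ratio)

origin = (0.0, 0.0)
d_mid = poincare_distance(origin, (0.5, 0.0))   # Euclidean distance 0.5
d_edge = poincare_distance(origin, (0.9, 0.0))  # Euclidean distance 0.9
```

Although the second point is less than twice as far from the origin in Euclidean terms, its hyperbolic distance is more than twice as large.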
SnapE -- Training Snapshot Ensembles of Link Prediction Models
Snapshot ensembles have been widely used in various fields of prediction. They allow for training an ensemble of prediction models at the cost of training a single one. They are known to yield more robust predictions by creating a set of diverse base models. In this paper, we introduce an approach to transfer the idea of snapshot ensembles to link prediction models in knowledge graphs. Moreover, since link prediction in knowledge graphs is a setup without explicit negative examples, we propose a novel training loop that iteratively creates negative examples using previous snapshot models. An evaluation with four base models across four datasets shows that this approach consistently outperforms the single model approach, while keeping the training time constant.
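Snapshot ensembles are commonly trained with a cyclic learning rate that restarts several times within one training run, saving a model at the end of each cycle. The sketch below shows a cyclic cosine schedule of that kind. Whether SnapE uses this exact schedule is not stated in the abstract, and the names `cyclic_cosine_lr` and `snapshot_steps` as well as the parameter values are assumptions.

```python
import math

def cyclic_cosine_lr(step, total_steps, num_cycles, base_lr):
    """Learning rate at a 0-indexed step under cyclic cosine annealing."""
    cycle_len = math.ceil(total_steps / num_cycles)
    pos = step % cycle_len  # position inside the current cycle
    return 0.5 * base_lr * (math.cos(math.pi * pos / cycle_len) + 1.0)

def snapshot_steps(total_steps, num_cycles):
    """Last step of every cycle, where a snapshot would be saved."""
    cycle_len = math.ceil(total_steps / num_cycles)
    return [min((c + 1) * cycle_len, total_steps) - 1
            for c in range(num_cycles)]

# Assumed settings: 100 training steps split into 4 cycles.
steps = snapshot_steps(100, 4)
lr_start = cyclic_cosine_lr(0, 100, 4, 0.1)
lr_before_restart = cyclic_cosine_lr(steps[0], 100, 4, 0.1)
lr_after_restart = cyclic_cosine_lr(steps[0] + 1, 100, 4, 0.1)
```

Each snapshot is taken when the learning rate is near its minimum, and the restart that follows pushes the model toward a different solution, which is where the diversity of the base models comes from.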
Disentangled Condensation for Large-scale Graphs
Xiao, Zhenbang, Liu, Shunyu, Wang, Yu, Zheng, Tongya, Song, Mingli
Graph condensation has emerged as an intriguing technique to provide Graph Neural Networks for large-scale graphs with a more compact yet informative small graph to save the expensive costs of large-scale graph learning. Despite the promising results achieved, previous graph condensation methods often employ an entangled condensation strategy that involves condensing nodes and edges simultaneously, leading to substantial GPU memory demands. This entangled strategy has considerably impeded the scalability of graph condensation, impairing its capability to condense extremely large-scale graphs and produce condensed graphs with high fidelity. Therefore, this paper presents Disentangled Condensation for large-scale graphs, abbreviated as DisCo, to provide scalable graph condensation for graphs of varying sizes. At the heart of DisCo are two complementary components, namely node and edge condensation modules, that realize the condensation of nodes and edges in a disentangled manner. In the node condensation module, we focus on synthesizing condensed nodes that exhibit a similar node feature distribution to original nodes using a pre-trained node classification model while incorporating class centroid alignment and anchor attachment regularizers. After node condensation, in the edge condensation module, we preserve the topology structure by transferring the link prediction model of the original graph to the condensed nodes, generating the corresponding condensed edges. Based on the disentangled strategy, the proposed DisCo can successfully scale up to the ogbn-papers100M graph with over 100 million nodes and 1 billion edges with flexible reduction rates. Extensive experiments on five common datasets further demonstrate that the proposed DisCo yields results superior to state-of-the-art counterparts by a significant margin. The source code is available at https://github.com/BangHonor/DisCo.
Disentangling Node Attributes from Graph Topology for Improved Generalizability in Link Prediction
Chatterjee, Ayan, Walters, Robin, Menichetti, Giulia, Eliassi-Rad, Tina
Link prediction is a crucial task in graph machine learning with diverse applications. We explore the interplay between node attributes and graph topology and demonstrate that incorporating pre-trained node attributes improves the generalization power of link prediction models. Our proposed method, UPNA (Unsupervised Pre-training of Node Attributes), solves the inductive link prediction problem by learning a function that takes a pair of node attributes and predicts the probability of an edge, as opposed to Graph Neural Networks (GNN), which can be prone to topological shortcuts in graphs with power-law degree distribution. In this manner, UPNA learns a significant part of the latent graph generation mechanism since the learned function can be used to add incoming nodes to a growing graph. By leveraging pre-trained node attributes, we overcome observational bias and make meaningful predictions about unobserved nodes, surpassing state-of-the-art performance (3X to 34X improvement on benchmark datasets). UPNA can be applied to various pairwise learning tasks and integrated with existing link prediction models to enhance their generalizability and bolster graph generative models.
Holder Recommendations using Graph Representation Learning & Link Prediction
Saxena, Rachna, Kumar, Abhijeet, Mishra, Mridul
Lead recommendation for financial products such as funds or ETFs is challenging in the investment space due to changing market scenarios and the difficulty of capturing financial holders' mindset and philosophy. Current methods surface leads based on product categorization and attributes such as returns, fees, and category to suggest similar products to investors, which may not capture a holder's investment behavior holistically. Other reported works perform subjective analysis of institutional holders' ideology. This paper proposes a comprehensive data-driven framework for developing a lead recommendation system in the holder space for financial products like funds by using transactional history, asset flows and product-specific attributes. The system infers a holder's interest implicitly by considering all investment transactions made, and collects available meta information to detect the holder's investment profile or persona, such as investment anticipation and investment behavior. This paper focuses on the holder recommendation component of the framework, which employs a bipartite graph representation of financial holders and funds using a variety of attributes, and further employs a GraphSage model for learning representations, followed by a link prediction model for ranking recommendations for a future period. The performance of the proposed approach is compared with a baseline model, i.e., a content-based filtering approach, on the hits at top-k (50, 100, 200) recommendations metric. We found that the proposed graph ML solution outperforms the baseline by an absolute 42%, 22% and 14% with a look-ahead bias, and by an absolute 18%, 19% and 18% on completely unseen holders, in terms of hit rate for top-k recommendations with k = 50, 100 and 200 respectively.
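The abstract does not define its hit-rate metric precisely. One common definition counts a holder as a hit when at least one of their future funds appears among the top-k recommendations. The sketch below follows that assumed definition; the function name `hit_rate_at_k` and the holder and fund identifiers are made up for illustration.

```python
def hit_rate_at_k(recommendations, held_out, k):
    """Fraction of holders with at least one held-out item in their top-k.

    recommendations: {holder: ranked list of fund ids, best first}
    held_out: {holder: set of fund ids actually bought later}
    """
    holders = [h for h in recommendations if h in held_out]
    if not holders:
        return 0.0
    hits = sum(
        1 for h in holders
        if any(fund in held_out[h] for fund in recommendations[h][:k])
    )
    return hits / len(holders)

# Toy data (assumed): two holders, three ranked funds each.
recs = {"h1": ["f3", "f1", "f2"], "h2": ["f2", "f4", "f1"]}
future = {"h1": {"f1"}, "h2": {"f9"}}
rate_at_1 = hit_rate_at_k(recs, future, k=1)
rate_at_2 = hit_rate_at_k(recs, future, k=2)
```

Under this definition the hit rate can only stay the same or increase as k grows, because a larger cutoff never removes a hit.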
Link-Backdoor: Backdoor Attack on Link Prediction via Node Injection
Zheng, Haibin, Xiong, Haiyang, Ma, Haonan, Huang, Guohan, Chen, Jinyin
Link prediction, inferring the undiscovered or potential links of a graph, is widely applied in the real world. Using the labeled links of the graph as training data, numerous deep learning based link prediction methods have been studied, which achieve higher prediction accuracy than non-deep methods. However, a maliciously crafted training graph can leave a specific backdoor in the deep model, so that when specific examples are fed into the model, it makes wrong predictions; this is defined as a backdoor attack. It is an important aspect that has been overlooked in the current literature. In this paper, we introduce the concept of a backdoor attack on link prediction, and propose Link-Backdoor to reveal the training vulnerability of existing link prediction methods. Specifically, Link-Backdoor combines fake nodes with the nodes of the target link to form a trigger. Moreover, it optimizes the trigger using gradient information from the target model. Consequently, a link prediction model trained on the backdoored dataset will predict a link carrying the trigger as the target state. Extensive experiments on five benchmark datasets and five well-performing link prediction models demonstrate that Link-Backdoor achieves the state-of-the-art attack success rate under both white-box (i.e., the target model parameters are available) and black-box (i.e., the target model parameters are unavailable) scenarios. Additionally, we test the attack under defensive settings, and the results indicate that Link-Backdoor can still mount successful attacks on well-performing link prediction methods. The code and data are available at https://github.com/Seaocn/Link-Backdoor.